Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning

Wang, Haining, Clark, Jason, McKelvey, Hannah, Sterman, Leila, Gao, Zheng, Tian, Zuoyu, Kübler, Sandra, Liu, Xiaozhong

arXiv.org Artificial Intelligence

A vast amount of scholarly work is published daily, yet much of it remains inaccessible to the general public due to dense jargon and complex language. To address this challenge in science communication, we introduce a reinforcement learning framework that fine-tunes a language model to rewrite scholarly abstracts into more comprehensible versions. Guided by a carefully balanced combination of word- and sentence-level accessibility rewards, our language model effectively substitutes technical terms with more accessible alternatives, a task that models trained with supervised fine-tuning or guided by conventional readability measures struggle to accomplish. Our best model adjusts the readability level of scholarly abstracts by approximately six U.S. grade levels -- in other words, from a postgraduate to a high school level. This translates to roughly a 90% relative improvement over the supervised fine-tuning baseline, all while maintaining factual accuracy and high-quality language. An in-depth analysis of our approach shows that balanced rewards lead to systematic modifications in the base model, likely contributing to smoother optimization and superior performance. We envision this work as a step toward bridging the gap between scholarly research and the general public, particularly younger readers and those without a college degree.
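The abstract names a balanced combination of word- and sentence-level accessibility rewards but does not spell out the reward functions. The sketch below is a hypothetical illustration of that idea, not the paper's actual implementation: `COMMON_WORDS`, `word_level_reward`, `sentence_level_reward`, and `balanced_reward` are all assumed names, with the word-level term scoring vocabulary simplicity and the sentence-level term favoring a target average sentence length.

```python
import re

# Hypothetical stand-in for an accessibility vocabulary; a real system would
# use a much larger frequency-based word list.
COMMON_WORDS = {"the", "a", "is", "are", "we", "show", "use", "make", "more",
                "easy", "read", "study", "people", "help", "find", "new"}

def word_level_reward(text: str) -> float:
    """Fraction of tokens drawn from a common-word vocabulary (higher = simpler)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in COMMON_WORDS for t in tokens) / len(tokens)

def sentence_level_reward(text: str, target_len: int = 15) -> float:
    """Score how close the average sentence length is to a target (higher = simpler)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    avg = sum(len(s.split()) for s in sentences) / len(sentences)
    return max(0.0, 1.0 - abs(avg - target_len) / target_len)

def balanced_reward(text: str, w: float = 0.5) -> float:
    """Weighted combination of the two reward terms, in [0, 1]."""
    return w * word_level_reward(text) + (1 - w) * sentence_level_reward(text)
```

In an RL fine-tuning loop, a scalar like `balanced_reward` would score each generated rewrite; the weight `w` is the kind of balance the abstract credits for smoother optimization.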


Simplifying Scholarly Abstracts for Accessible Digital Libraries

Wang, Haining, Clark, Jason

arXiv.org Artificial Intelligence

Making science more accessible remains a challenge even with much effort devoted to it on the producer and publisher side. As content producers, researchers are encouraged to engage directly with the public, either through social media (Davies, 2008; Hara et al., 2019; Knox and Hara, 2021) or by crafting more digestible manuscripts in research (Maurer et al., 2021) and practice (Grene et al., 2017). Funding agencies and renowned journals also encourage the communication of scientific findings in accessible language. For instance, the National Institutes of Health (NIH) advocates "clear and simple" principles when communicating with audiences with limited health literacy, and the Proceedings of the National Academy of Sciences of the United States of America (PNAS) requires authors to submit a significance statement accessible to non-experts (Berenbaum, 2021; Pool et al., 2021). As scientific research progresses with increased specialization and interdisciplinarity, it is acknowledged that the use of jargon effectively reduces communication costs among domain experts, particularly those responsible for reviewing submissions. This specialized language, however, can become incomprehensible to those without a similar research background. While efforts to share scientific findings in more accessible language from the producer side are gaining traction, widespread adoption is unlikely in the near future due to the inherent conflicts between the specialized nature of scholarly communication and the public-oriented dissemination of scientific findings. Within this effort to create understandable research findings and open science to broader communities, libraries--and our digital libraries in particular--have a role to play. Driven by this idea, we propose to start by improving the readability of abstracts from scholarly works through automated rewriting.


From Complexity to Clarity: How AI Enhances Perceptions of Scientists and the Public's Understanding of Science

Markowitz, David M.

arXiv.org Artificial Intelligence

This paper evaluated the effectiveness of using generative AI to simplify science communication and enhance the public's understanding of science. By comparing lay summaries of journal articles from PNAS, yoked to those generated by AI, this work first assessed linguistic simplicity across such summaries and public perceptions. Study 1a analyzed simplicity features of PNAS abstracts (scientific summaries) and significance statements (lay summaries), observing that lay summaries were indeed linguistically simpler, but effect size differences were small. Study 1b used a large language model, GPT-4, to create significance statements based on paper abstracts, and this more than doubled the average effect size without fine-tuning. Study 2 experimentally demonstrated that simply written GPT summaries facilitated more favorable perceptions of scientists (they were perceived as more credible and trustworthy, but less intelligent) than more complexly written human PNAS summaries. Crucially, Study 3 experimentally demonstrated that participants comprehended scientific writing better after reading simple GPT summaries compared to complex PNAS summaries. In their own words, participants also summarized scientific papers in a more detailed and concrete manner after reading GPT summaries compared to PNAS summaries of the same article. AI has the potential to engage scientific communities and the public via a simple language heuristic; these findings advocate for its integration into scientific dissemination for a more informed society.
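Both abstracts above rest on quantifying "linguistic simplicity," and the first reports a shift of roughly six U.S. grade levels. Neither abstract states which metric is used, so the following is a minimal sketch of one standard choice, the Flesch-Kincaid grade level, with a heuristic vowel-group syllable counter (`count_syllables` and `fk_grade` are assumed names, not the papers' code):

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: runs of vowels, with a silent-'e' adjustment."""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)
    n = len(groups)
    if word.endswith("e") and n > 1:
        n -= 1  # treat a trailing 'e' as silent
    return max(n, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

A drop from a postgraduate to a high-school score under a formula like this is what "six U.S. grade levels" of simplification means in practice; jargon-dense prose scores far higher than short, common-word sentences.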